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RAPIDLY COMPUTING SPARSE LEGENDRE EXPANSIONS VIA SPARSE 

FOURIER TRANSFORMS 

XIANFENG HU, MARK IWEN, AND HYEJIN KIM 


Abstract. In this paper we propose a general strategy for rapidly computing sparse Legendre 
expansions. The resulting methods yield a new class of fast algorithms capable of approximating a 
given function / : [—1,1] —>-11 with a near-optimal linear combination of s Legendre polynomials 
of degree < A in just (slog Aj^^'^^-time. When s <C A these algorithms exhibit sublinear runtime 
complexities in A, as opposed to traditional II(A log A)-time methods for computing all of the first 
A Legendre coefficients of /. Theoretical as well as numerical results demonstrate the effectiveness 
of the proposed methods. 


1. Introduction 

In this paper we consider Legendre-compressible functions which can be well approximated by a 
linear combination of a small number of unknown, and potentially high-degree, Legendre polynomi¬ 
als. Given such a function our objective is to quickly learn the best basis of Legendre polynomials 
with which to approximate it, and then to compute their coefficients. Let / ; [—1,1] ^ IR be a 
degree N polynomial, and L„(x) denote the Legendre polynomial of degree n. We aim to rapidly 
and accurrately compute /’s Legendre coefficients, f{n) € IR for n G {0, ... ,N} with 

A 

(1) /(a;) = 

n=0 

whenever /(n) ~ 0 for all but s N initially unknown values of n. We will call any numerical 
method with this objective a sparse Legendre expansion algorithm. 

Note that solving this problem is straightforward if one is willing to sample / at A -|- 1 points in 
[—1,1], and then compute all A -|- 1 of its Legendre coefficients. However, any such approach will 
necessarily require H(A)-operations, which can become overwhelming when the maximal degree, 
A, of / is large. Our objective here is to select the best basis of s <C A Legendre polynomials of 
degree < A for /, and then estimate their coefficients, in (s log Aj^^^^-time. When s is significantly 
smaller than A, these methods will be faster than any traditional approach which computes all A 
Legendre coefficients of /. Fast sparse Legendre expansion algorithms of this kind are a natural first 
step toward the development of computationally tractable algorithms for approximating functions of 
many variables with respect to tensorized Legendre polynomial bases. In such multivariate problems 
the maximal degree. A, grows exponentially in the number of variables, rapidly rendering even 
0(A)-time methods impractical. Our longterm goal is to extend (s log A)®(^^-time sparse expansion 
methods for functions one variable, once they are properly understood, to the multi-variate setting. 
If possible, the resulting methods would be of value in many computational applications including, 
e.g., uncertainty quantification [32] and the computation of polynomial chaos expansions 
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The majority of previously proposed sparse Legendre expansion methods are based on Prony- 
like approaches. Examples include results by Peter et. al. [36] who develop a method based on 
a more general approach from |35j which needs only 0(s) samples from (various derivatives of) 
/ in order to recover its s-sparse Legendre expansion. More recently, Potts and Tasche [38] used 
approximation techniques to adapt previous Prony-like methods for the recovery of Chebyshev- 
sparse functions [39| to the Legendre-sparse setting. However, no theoretical results are proven in 
|38j that demonstrate the methods therein can extend to functions with compressible (as opposed 
to exactly s-sparse) Legendre expansions, nor is it proven that they can tolerate even modest 
levels of noise in general. In contrast, herein we provide a theoretical support recovery guarantee 
which proves that our techniques can indeed locate the principle support of a relatively large class of 
Legendre-compressible functions (see, e.g.. Theorem [5]). When combined with coefficient estimation 
methods based on techniques from compressive sensing (see, e.g.. Lemma Hj) these support recovery 
guarantees allow one to prove a variety of general sublinear-time recovery guarantees for functions 
with compressible Legendre expansions. 

Other sparse Legendre expansion methods include those based on compressive sensing approaches 
m In particular, Rauhut and Ward [40] demonstrate that 0{s ■ log^iV) samples from / suffice in 
order to accurately and stably approximate / with a near-optimal sparse Legendre expansion. The 
associate reconstruction algorithms are n(A^)-time, however. As opposed to these previous sparse 
Legendre expansion methods, we propose a new approach motivated by a recently proposed FFT- 
based algorithm for computing all iV-|- 1 Legendre coefficients of a given function / in O(A^logiV)- 
time. In the process, we demonstrate a general approach which allows one to utilize any Sparse 
Fourier Transform algorithm (see, e.g., [HlESl EU143]) one desires in order to recover functions 
which are sparse/compressible in other polynomial bases (herein, Chebyschev and Legendre). 

Computing all A^-|-l Legendre coefficients of / in o(A^^)-time is itself a challenging problem which 
has attracted a good deal of attention over the past two decades. Proposed methods include, e.g., 
fast multipole-like approaches [3], and algorithms based on integral transform techniques m , to 
mention just a few. These methods are 0{N log'^ A^)-time for various c G R'*'. Most pertinent to the 
sparse Legendre expansion algorithms proposed herein, however, are recent FFT-based algorithms 
for computing all N Legendre coefficients in 0(A^log A^)-time [26]. These methods work by (i) 
implicitly mapping the Legendre coefficients of / to the Fourier coefficients of a related function, 
fr, which remains easily to sample, followed by (ii) computing the Fourier coefficients of fr with an 
FFT, and then {iii) using the computed Fourier coefficients of fr in order to recover the Legendre 
coefficients of the original function / (i.e., by inverting the map from (i)). The sparse Legendre 
expansion methods proposed herein are based on this same type of approach. In particular, the 
algorithms proposed herein result from combining ideas from Iserles’ FFT-based Legendre algorithm 
[26] (summarized below in ^2.2p with sparse Fourier transform techniques (briefly discussed in the 
next section). 


1.1. Sparse Fourier Transforms. Sparse Fourier Transforms (SFTs) are algorithms for quickly 
computing near-optimal sparse approximations to the Fourier series of a given periodic function 
/ : [—TT, tt]^ —)■ C. Suppose that / is a trigonometric polynomial of degree N in every variable so 
that the Fourier series of / is effectively / G . An optimal s-term trigonometric approximation 
to / is given by 


/r(x) := 

i=i 

2 


( 2 ) 


where a;i,..., Uj^d G {—N/2, N/2]^ri7j^ are ordered by the magnitudes of their Fourier coefficients, 
/ (uj) G C, so that 

(3) |/(^i)| > \ f{^2)\ > ■> \ f{uj^D)\. 

The optimal s-term approximation error is then ||/ — /°’’*'||2 = 11/ ~ /°^*|| 2 - In this setting, any 
discrete Fourier method will take function evaluations of / as input, and then output an approximate 
for some value of s G {0,..., A standard FFT always uses s = N^, and so recovers 

trigonometric polynomials exactly. Sparse FFTs allow s to be chosen independently of N^, and 
exactly recover all trigonometric polynomials consisting of at most s nonzero terms. 

The primary objective of SFTs is to compute an accurate approximation to /, and therefore 
to / itself, as quickly as absolutely possible. This has lead to the development of a wide range 
of algorithms for approximating degree N trigonometric polynomials which use (s • log 
floating point operations, where s is the user specified sparsity parameter. In order to achieve these 
operation counts, all such SFTs can utilize at most (sdog function evaluations from / during 

their execution. It is interesting to note, for the purposes of comparison, that SFTs are therefore 
closely related to Fourier-based compressed sensing techniques [inillSlIIlKllllMKlT] whose primary 
objective is to approximate / using as few function evaluations, or samples, as absolutely possible 
(see, e.g., m)- However, although the relationship between SFTs and compressed sensing is very 
close, SFTs generally utilize more samples than the best compressed sensing methods in practice. 
Similarly, even the fastest Fourier-based compressed sensing methods are too slow to serve as SFTs 
since SFTs purposefully exceed the strictest sampling requirements of compressed sensing methods 
in order to reduce their runtime complexities as much as possible. 

The first sparse Fourier methods were essentially approximate Hadamard transforms that were 
developed by researchers in the machine learning community for quickly learning boolean functions 
of many variables (see, e.g., [30l[6l[33] and [MIES]). These techniques were later adapted to produce 
randomized SFTs for approximating trigonometric polynomials as rapidly as possible [Ml [El El El]. 
These subsequent SFTs all take random samples of a given periodic function / as input, and then 
output a trigonometric polynomial, y, of degree N which satisfies ||/ — y ||2 ~ ||/ — /°’’*||2 with high 
probability. The fastest of these SFTs [21] uses only s -log^^^^^ N operations. As a result, it is faster 
than the FFT for accurately approximating periodic functions which are dominated by s ^ A^ of 
their largest magnitude Fourier coefficients [29| . 

Over the last several years SFTs have been improved significantly in both theory and practice. 
Recent work includes better implementations [251 03], improvements in runtime complexity 

bounds (both upper and lower) [Ml EH EE], adaptation of the methods to the recovery of su¬ 
perpositions of sinusoids with non-integer frequencies [8], and improvements in theoretical error 
guarantees PEZmSlES]. In particular, entirely deterministic SFTs exist [271 EE] that are guaran¬ 
teed to always return a near-optimal sparse trigonometric polynomial, ys ■ [—'tTjTt]'^ —)• C, having 
||/~ys ||2 ~ ll/“ /°*^*|| 2 . More specifically, the following theorem was proven in [28] . 




Theorem 1. Suppose f : [—7r,7r]^ — )• C has /(wi,... ,u]d) = 0 if (wi,... ,ujd) ^ ([~T) tI Z) 
Let s,e~^ G N \ {1} with (s/e)^ > 4. Then, there exists a simple deterministic algorithm that is 


guaranteed to output a trigonometric polynomial, ys : 


-TT, TT 


D 


C, satisfying 


(4) 11/ ys|l2 — 

The algorithm’s operation count is 

(5) O 
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's'^ ■ D* ■log'^{ND)\ 
log (f) • e2 ) 
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If succeeding with probability (1 — 5) G [2/3,1) is sufficient, and (s/e) >2, a Monte Carlo variant 
of the deterministic algorithm may be used. This Monte Carlo variant will output a trigonometric 
polynomial, ps : [—7r,7r]^ —>■ C, that satisfies Equation^ with probability at least 1 — 5. Its operation 
count will be 


( 6 ) 


O 


/ g . n4 

f -• log^ (ND) ■ log 



The Fourier algorithms referred to by Theorem [T] are able to accurately approximate the discrete 
Fourier transform of a given function much more quickly than standard Fast Fourier Transform 
(FFT) methods [9] whenever the sorted magnitudes of the Fourier coefficients ([3]) go to zero quickly 
enough [53]. More specifically, the developed Fourier approximation algorithms have operation 
counts that scale polynomially in D and log A^, as opposed to standard FFT methods whose oper¬ 
ation counts scale exponentially in D and log N. We direct the reader to [20] for a recent survey of 
SFT techniques, as well as for an easy introduction to their design and implementation. 


1.2. Our Proposed Sparse Legendre Expansion Method. The thought behind our proposed 
sparse Legendre expansion approach is naively simple at first glance. One thinks: “FFT-based 
algorithms for computing all N Legendre coefficients like [26| appear to work well. Maybe SETs can 
replace the FFTs in these methods in order to allow us to rapidly approximate Legendre-compressible 
functions!” Of course, a multitude of technical difficulties present themselves almost immediately. 
Mainly, mapping the sparse Legendre coefficients of / to a set of Fourier coefficients of a related 
function, fr, is only helpful for computing sparse Legendre expansions if the map preserves sparsity. 
The speed and accuracy of SFT methods depend on the sparsity of the function to which they are 
applied. If the sparse Legendre coefficients of / don’t map to a set of Fourier coefficients of fr that 
are also fairly sparse, then SFT methods will not be able to approximate fr quickly enough to be 
interesting. Furthermore, and perhaps more obviously, the map from the Legendre coefficients of / 
to the Fourier coefficients of fr must be fairly well-conditioned in the £oo-sense. If the map sends a 
few of the large-magnitude Legendre coefficients of / to Fourier coefficients of comparatively tiny 
magnitude, then we will have difficultly recovering them with an SFT. Finally, the inverse map from 
the computed Fourier coefficients of fr back to the Legendre coefficients of / must be fast (e.g., 
well approximated by a sparse matrix multiply). If not, we will not be able to use our computed 
Fourier coefficients of fr in order to compute the Legendre coefficients of the original function / 
quickly enough to be of interest. 

Unfortunately, achieving all of these properties at once appears to be quite difficult. Herein 
we take advantage of the fact that Iserles’ map from Legendre coefficients to Fourier coefficients 
is fairly well-behaved with respect to sparsity (see ^2.21 and In particular, we show that 

Legendre-sparse functions are mapped to Fourier-compressible functions (i.e., the map preserves 
sparsity fairly well). Unfortunately, it appears to be impossible to force the map to also be both 
well-conditioned in the £oo-sense, and quickly invertible. To compensate for this defect we modify 
our initial idea and employ a two-stage approach instead: We first use a “pretty well-conditioned” 
version of Iserles’ map in combination with SFT methods in order to rapidly identify the large- 
magnitude Legendre coefficients in /, and then use results form compressive sensing in order to 
accurately approximate the identified Legendre coefficients. Doing so allows us to develop workable 
sparse Legendre expansion algorithms which run in sublinear-time for sufficiently small sparsities. 
We refer the reader to ^for additional details, and to the next section for a related example. 
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1.3. A Simple Example: Recovering Sparse Chebyshev Expansions via SETs. In this 
section we briefly consider Chebyshev-sparse functions of the form 

9ix) = ^ anTnix), 5 C {0,..., N}, |5| = s < N, 

n£S 


where Tn{x) denotes the degree n Chebyshev polynomial of the first kind. In this setting it is 
beneficial to consider the standard transformation h = g{cos x). It is well known that this transfor¬ 
mation implicitly maps the Chebyshev coefficient of g to the Fourier cosine series coefficient 
of h (see, e.g., [9]). In particular, we have that 

h{x) = ^ UnTnicos x) = ^ o„ cos(nx) = ^ ^ . 

nG<S 


It is now straightforward to see that creating h by resampling g according to the cosine function 
implicitly produces a sparsity-preserving linear map from the Chebyshev coefficients of g to the 
positive Fourier coefficients of h, 


(7) 
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Note that the map above ([7|) has all of the desirable properties mentioned in section 11.21 It is 
exactly sparsity preserving, well conditioned, and trivially invertible. As a consequence, one can 
easily compute the sparse Chebyshev expansion of g using SFTs. One simply applies the SFT of 
their choice to h and then reconstructs the Chebyshev coefficients of g from the result using their 
knowledge of ([7|). One goal of the research initiated here is to find a good sparsity-preserving map 
similar to dT]) for use with Legendre polynomials, if possible. In this paper we make a first attempt 
toward this goal by analyzing the maps proposed by Iserles in [26j . 


2. Notation and Background 

We will denote the Legendre polynomial of degree n by The Legendre coefficient of 
/ : [—1,1] —)• M is then 

(8) /(n) ;= + ^ j ^f{x)Lnix) dx 

for each n € N. The Fourier series coefficients of a function / : [—vr, tt] —)■ C will be denoted by 

( 9 ) f\u;):=^£ 

for all oj £ Tj. The sequence of Legendre or Fourier coefficients of an appropriate function / will be 
called / or /, respectively. 

For any matrix X G ([^Nxn denote the column of X by Xj G C^. The adjoint 

of a matrix, X G will be denoted by X* G and the singular values of any matrix 

X G will always be ordered as cri{X) > a2{X) > • • • > o'njin(7v,n)(Ai) > 0. Also, the condition 

number of the matrix X will denoted by k(A) := (Ti(A)/(Tinin(Ar,n)(Ai). We will use the notation 

[N] := {0,... ,N} C N for any A G N. For any matrix X G and set S C [n] the matrix 

5 






Xs € will be the submatrix of X formed by selecting the columns of X indexed by 5. 

Similarly, for any vector x E and set S C [n] the vector X 5 E will have entries 


{xs)j 


0 , iij^S 

Xj, if j e S 


Finally, given any x E C'^, the vector Xs°’’* E will always denote an optimal s-sparse ap¬ 
proximation to X. That is, Xs°’’*' will always (i) have at most s E [A^] nonzero entries, and (ii) 
satisfy 


( 10 ) 


X — X 


opt 11 _ 


zeC'*, 


inf 


X — z 


z||oS 


<s 


for all p > 1 . 


2.1. Bounded Orthonormal Systems. Let T> C R” be endowed with a probability measure /i. 
Further, let T = {ipo ,..., V’Af} be an orthonormal set of real-valued functions on D so that 

/ )dfJ. (x) = 6ij. 

Jv 

We will refer to any such ^ as an orthonormal system. More specifically, we will utilize a particular 
type of orthonormal system. 

Definition 1. We call T = {V’o, • • • ,'07v} a bounded orthonormal system with constant K E if 

llV'fcIloo •= sup IV' ( X )| < K for all k E [N]. 

X e 


For any orthonormal system, 'L, on D C R” with probability measure p, we may create an 
associated random sampling matrix, R E , as follows: First, select m -|- 1 points 

xq, ... ,Xm E R independently at random according to Then, form the matrix R by setting 
Ri,j ■= 'f’j i ) for each i E [m] and j E [N], The following theorem concerning random sampling 
matrices created from bounded orthonormal systems in this fashion is proven in U7]E 

Theorem 2. Let R E a random sampling matrix created from a bounded or¬ 
thonormal system with constant K > 1. Let e E (0,1), s E [A^] \ {0}, and set R = If 

m > CK'^€~‘^sln^{N), then with probability at least 1 — 

(11) Vf-e < cj|5| (^Rs'j < cTi (^Rs^ < vTTe 

will hold simultaneously for all nonempty subsets S C [A^] having jSI < s . Here the constant C > 0 
is fixed and universal. 

As pointed out in [JO], the reweighed Legendre polynomials 

( 12 ) Q^[x) ■.= {^ ^ {l-x^f/^L^{x) 

form a bounded orthonormal system with constant K < y/S with respect to the Chebyshev prob¬ 
ability measure dfj,{x) = 7 r“^(l — x‘^)~^^‘^dx on V = [— 1 , 1 ]. As a result, one easily obtains the 
following corollary of Theorem [2l 

^So that P [xj £ 5] = /r (5) for all measurable 5 C D and j € [m]. 

^See Theorem 12.31 in |17 |. 
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Corollary 1. Let R G he a random sampling matrix created from {Qq, ..., Qat} in 

(m via m + 1 points xq, ... ,Xm G [—1, 1] drawn independently aceording to the Chebyshev measure. 
Let € G (0,1), s G [N] \ {0}, and set R = -j^^R. If m > ‘iCe~‘^s\n^{N), then with probability at 

least 1 

k{r*sRs) < 

will hold simultaneously for all nonempty subsets S C [N] having 15| < s . The constant C > 0 is 
as in Theorem\^ 

Corollary [U guarantees that we can select a set of m +1 points from [—1,1] once for each value of 
N which will lead to a random sampling matrix for (1121) . R G all of whose associated 

submatrices Rg are nearly isometric maps from Ill'll into R™. Furthermore, R can be formed in 
0{mN)-time by using the standard recurrence relation for Legendre polynomials [l^ 

(m + l)Lm+i{x) = (2m + l)xLm{x) - mLm-i{x) 

in order to generate each row. This is a one-time computational cost for each choice of m, iV G N. 
Alternatively, in a low memory setting, one may use fast methods based asymptotic expansions in 
order to quickly generate any desired submatrix of R from Corollary [H Rs G on the fly. 

This can be accomplished in 0{m ■ s)-time for any particular such submatrix of R as needed [7]. 

2.2. Iserles’ Map from Fourier to Legendre Coefficients. For a given analytic function / 
and r G (0,1], define /^ : C —>• C by 

(13) Ux) := (1 - f Q (^-le-i- + . 

Here, when r < 1, / is evaluated on a Bernstein ellipse in the complex plane; when r = 1, / is 
composed with cos(x) as per Chebyshev interpolation. Note that fr will be both analytic (since / 
is), as well as 27r-periodic on M. Hence, both the Legendre coefficients of / : [—1,1] —M and the 
Fourier series coefficients of fr ■ [—'/r, 7 r] —)• C will decay exponentially (see, e.g., [45l|46]). In any 
such setting the following map may be constructed from fr to / (see |26] for details). 

Let {a)j be defined recursively for all a G M and j G N by 


(14) 

where (a)o := 1, and set 

1—1 

1 

+ 

rH 

1 

II 

(15) 

(2i)!j!(* + i)j 

for all i,j G N. Then, we have 


(16) 

CX) 

fii) = ^9i,j fr {-i - 2j) 

i=o 

for all i G N. Given the rapid decay of both cnj and fr when r G (0,1), one may truncate the sum 
in order to approximate the first N Legendre coefficients using 

(17) 

M 

f{i) ~ ^9i,j fr i-i - 2j), 
j=0 


for a modest M ~ log;^/^(A^), after approximating the Fourier coefficients fr{0),...,fr {—N — 2M) 
using an FFT. For r = 1 (or close to 1) a modest M can still be chosen based solely on the decay 
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of fr- The resulting numerical method requires O(A^logA^) floating point operations in order to 
approximate f{i) for all i G [A^]. 

Herein we are primarily concerned with the setting where f{x) is a polynomial of degree at most 
N (recall ([I])). In this case it is easy to verify that /r(w) = 0 for all uj < —N and r G (0,1]. Thus, 
the map ()16p reduces to the finite linear system 


(18) 


f : = 


/ /(O) \ 


V f{N) 


= Gr 


( fr (0) 


V fr i - N ) 


where G is the upper triangular matrix with entires 


(19) 


r 9 i,{j-i)l 2 if * < j, and i = j mod 2 
\ 0 else 


We are now prepared to describe our method for rapidly and accurately computing sparse Legendre 
coefficient expansions. 


3. A Simple SFT-based approach for Reconstructing Sparse Legendre Expansions 


We propose a two stage method for approximating functions, / : [—1,1] —)• M, with sparse/ 
compressible Legendre coefficient expansions as per ©• During the first stage, we use Iserles’ 
map from 1)2.2I in order to help us identify Legendre polynomials whose coefficients are large in 
magnitude in /. We accomplish this by sampling / according to its modified form, fr from (1131) . 
in order to take advantage of the fact that fr (—j) ~ f{j)/y/ffj holds for all j G [N] (see, e.g., 
Lemma [2] together with Theorem [5] in Q . This fact guarantees that the Fourier coefficients of 
fr will be compressible whenever the Legendre coefficients of /, f from (I18p . are sparse (see, e.g., 
Lemma E] in ^for details). Hence, we may utilize SFT techniques from Hi.II in order to rapidly 
identify the largest magnitude Fourier coefficients of fr which, in turn, immediately reveal the 
largest magnitude Legendre coefficients of / via (|18|) . 

We are happy; using Iserles’ map with an SFT is good enough to guarantee that one can rapidly 
identify large magnitude Fourier coefficients of fr which generally correspond to large magnitude 
Legendre coefficients of /! Unfortunately, this SFT-based technique does not appear to allow us 
to quickly compute the Legendre coefficients with much accuracy. SFTs can get us fast estimates 
that are “in the right ball park” - accurate enough to tell us that a coefficient is large - but getting 
more than a few digits of accuracy this way appears elusive. Of course, one can always force a 
good SFT method to supply Iserles’ method (flT)) with enough Fourier coefficients of fr to make it 
produce accurate estimates. However, this appears to be far too slow an approach to be terribly 
interesting for the values of r that produce decently conditioned maps Gr (i-e., for r « 1). 

Thankfully, bounded orthonormal system results for reweighted Legendre polynomials (recall 
M) can easily solve the Legendre coefficient estimation problem for us once we know which coef¬ 
ficients to compute. Given access to a well-conditioned random sampling matrix R (recall Corol¬ 
lary [T|), along with small vector y G of additional reweighted samples from / taken at 

the points Xj G [—1,1] used to build R, 


( 20 ) 



0^(1 - 

^2{m + 1 ) 


fi^j 


one can accurately estimate any given set of Legendre coefficients of /, S C [N] with < s, by 
quickly solving a small least-squares problem. In particular, this means that we can simply (f) 
identify a set of important Legendre coefficients of / with an SFT, and then (ii) use a random 







sampling matrix to accurately estimate the identified Legendre coefficients. See Algorithm [T] for 
pseudocode, and Lemma 0] in 0for more details regarding coefficient estimation. 


Algorithm 1 Fast Sparse Legendre Coefficient Expansion Algorithm 


Input: (i) Sparsity s G [N], (ii) r G (0,1], (in) Pointer to a Random Sampling Matrix, R G 
R(™+1)x(^+1), as per Corollary [TJ {iv) Renormalized samples from /, y G IR"*"''^, as per (f20]l . 
(v) Pointer to / from ([T]), and (vi) Pointer to a Sparse Fourier Transform code, SFT 
Output: A Sparse Approximation of the Legendre Coefficients of / from ([T|), 

1; Find the degrees of important Legendre polynomials present in /, 5 C [A^j with |5| < s, by 
running an SFT on fj. from (|13p and recording the < s most energetic frequencies it returns. 


2 ; Approximately Solve a Least Squares Problem: Ri Z m^n 


arg min^g]j^|5| 


Rs ^ - y 


2' 


The runtime complexity of Algorithm [T] will generally be largely determined by the type of SFT 
chosen in line 1. Both randomized and deterministic algorithms exist. The randomized approaches 
are generally faster, but have a small (usually tunable) probability of failing to return a good answer. 
The deterministic approaches are slower, but are guaranteed to approximate a given function as 
well as is possible with a sparse representation of the chosen size. Recall Theorem [1] in 1)1.11 for 
example results. 

Considering the runtime complexity of line 2, we note that we may efficiently solve the least 
squares problem there using a Conjugate Gradient (CG) algorithm. Suppose that the normalized 
random sampling matrix, R G passed to Algorithm [1] satisfies (jlll) of Theorem [2| 

with e = 3/5. In this case, a CG method will allow one to use just 


Clog 


+1 


|y||2 


< Clog 


+1 


y«(«s«s)-i 

CG iterations in order to get 

( 21 ) 


|y||2 


= CGoga 


|y||2 


Rs 


^min 


< 6 . 


for any desired <5 G ]R“'" (see, e.g., Ghapter 7 of [5]). Here, Gorollary [T] (i.e., (fTTIP has been used to 
bound K (R^Rs) under the assumption that m = C'sln^{N) for appropriate fixed universal con¬ 


stants C, C' G IR'^. Each GG iteration then takes O (s^ ln'^(A^))-time. Noting that ||y||2 < C" 
will also hold, for a universal constant C" G IR'^, whenever R satisfies (llip (see, e.g.. Exercise 6.6 i 
[HI), we have that we can compute an G IRI*^! satisfying (12111 in O I In^(A) • In 


-time. 


4. Error Analysis and Recovery Guarantees 

In this section we analyze Algorithm [TJ The main results establish both that (i) the largest mag¬ 
nitude Legendre coefficients present in / can be rapidly identified via SFT methods (see Theorem [5]), 
and that (ii) once identified, the largest magnitude Legendre coefficients can be both rapidly and ac¬ 
curately approximated (see Lemma 0]). By combining these results one can establish deterministic!! 
sublinear-time recovery guarantees for many different classes of Legendre-compressible functions. 

o 

Note that we are implicitly using randomized techniques to construct the random sampling matrices, R G 
1 R(™'+i)x(^+i), used in line 2 of Algorithm [T] However, the related probabilistic guarantees establish results for 
all sufficiently sparse signals with high probability, and so can be viewed as establishing the existence of entirely 
deterministic methods. 


9 


























For example, one can easily prove sublinear-time recovery guarantees for exactly s-sparse Legendre 
polynomials of the form 

(22) f{x) = ^ f{n)Ln{x) 

neSc[N] 

where |5| = s, min5 = Q{N/s), and all s nonzero /(n) £ IR have (roughly) the same magnitude. 
Doing so we may obtain, e.g., Theorem [3l 

Theorem 3. There exists a deterministic O (s® log^(A^))-time algorithm that is guaranteed to ex¬ 
actly recover (up to machine precision) the Legendre coefficients of any function of type (l22]) . 

Proof: Apply Corollary [3] followed by Lemma [H □ 


Note that the runtime of the deterministic algorithm referred to by Theorem [3] is indeed sub- 
linear in N for sparsities s N. However, it is also almost certainly suboptimal - algorithmic 
modifications can probably be made that reduce the runtime complexity further without negatively 
impacting the recovery guarantee. 

It is also important to point out that the current assumptions concerning / in (j22p can be loosened 
considerably, without loosing deterministic recovery guarantees, by applying the subsequent results 
differently than done to get Theorem [3l However, the theoretical results thus derived suffer both 
aesthetically and, in other ways, technically. For this reason we will leave the proof of alternative 
guarantees via Theorem [5] and Lemma 0] to the interested reader. 

We will now begin to prove our main theoretical results, starting with those concerning the rapid 
identification of the Legendre polynomials whose coefficients are largest in magnitude in /. 


4.1. Support Identification. For the purposes of analyzing line 1 of Algorithm [T] it is crucial to 
understand how sparse 


f' 


/ fr ( 0 ) \ 


V fr i-N) 


will be given that f is sparse (recall (jlSp ). We will begin to move toward this goal by considering the 
matrix G~^. Once it is properly understood, we will then be able to consider the compressibility 
characteristics of for sparse vectors f. The following lemma gives the entries of Gf^. 


Lemma 1. The even rows of the inverse matrix G^ ^ from (I18p are given by 


(23) 


0 


j < i 



(-iL (-l)fc r-^i(2j+2ky. (2k\ l+2i 

2-jk=i 4^^ {j—k)\{j-\-ky.{2ky. \k-\-i) fc+z+l 




and [G„^] =0 for i,j = 0,..., I ^ I. Similarly, the odd rows of the inverse matrix G„ ^ from 

V /2i,2j-ei 

(HD are given by 


(24) 


0 


j < i 



2i+l,2j+l 




(-ly (-l)fc r--2»-l(2j+2fc+2)! ^2fc+l^ 2+2i I A I 

2-4J A^k=i 2-4^ {j—ky{j-\-k-\-iy(2k-\-iy V k—i ) k~\-i-\-2 — L 2 J 


and =0/ori,j = 0,..., LfJ. 

V J 2i+l,2j ^ 
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Proof: See Appendix!^ 


□ 


The following corollary of Lemma [T] establishes simpler and more useful formulas for each entry 
of 

Corollary 2. The even rows of the inverse matrix from (I23p may also be written more com¬ 
pactly as 


(25) 


G. 


-1 


= 


2i^2j 


j < i 


(2i+2i)! 


-2h2i+i) r(|) 

(i+i)! (i+i)! i+j+l (j-i)\ r(i-j+|) 




and ( G. 


=(-i 


2 , 2 , + . LtJ- 

may be written as 


Similarly, the odd rows of the inverse matrix G^ ^ from 


(26) 


0 


j < i 



2i+l,2j+l 


(-ly+J (2(i+i+l))! r-^^-l(2i+2) r(|) ,•<,■< I TV I 

4»+J+l (i+j+1)! (i+i+1)! i+i+2 (j-i)\ r(i-i+|) — J — L 2 J 


and (g^ 2 i = 0 b-? = 0,..., . 

Proof: See Appendix [Bj □ 


With CorollaryOin hand, one may now see that each row and column of Gj. ^ is weakly dominated 
by it’s diagonal entry. See Theorem 0] below for an exact statement. 

Theorem 4. For nonzero entries of the inverse matrix Gf^, the decay rate of each row is given by 

for X > 2, 
for X = l,x = 2. 

The decay rate of each column is given by 


(27) 




n+2 


<1 


(Gr )n,n+2a; 




and 


< 




2^/tt XyJx — 2 


{Gf^)n, 


n+4 


< I 


(G 

(G 


-i'. 


r ')n,n 


(28) 


(G 


-1^ 


)n—2x,n 


{Gf^)n-2,. 


< 


{G 


-1^ 


, and 


< Jd 


~ \/27r Xs/x—2 


(G; 


-1^ 


jn—4,n 


< 


(GL^)n,n 

(G-^)n,n 


The diagonal entries of G^ ^ satisfy 


(29) 


/7rn 


1 - 


8re J 


< 


(g; 


r )n,n 


< 


vrn 


for n > 0, and 


for 2 < X < |_|J , n > 6 
for X = l,x = 2,n > 2x. 


(G 




lo,o 


= 1 . 


Proof: First, let’s simplify the term T (|) / ((j — i)!r (i — j + |)). Since r(3/2) = and by 

the reflection formula for the Gamma function (see, e.g., |47]). r(—z) = —vr/ (zr(z) sin(7rz)) for 
z ^ Z, we have 


r(|) (_i).-+i(j_i_|)r(j-i-|) 

{j - i)!r (i - j + I) 2v4F (i - i)\ 
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According to the half integer argument for the Gamma function, r(n/2) = ((n — 2)\\^/^T) 
for n S where n!! = {2k)\/ (2^A:!) when n = 2k — 1 for k £ Thus, 

r(i) (-i)i-m (j_^_3^(2j_2j-5)!! 

(j-i)!r(z-j + |) 2 U-i)\ 2^-*-2 

(j-i-l) (2(j-^-2))! 

2 • 4i-*-2 (j -i)[j -i - 1) {j -i- 2)!(j - i - 2)!■ 


By Stirling’s approximation, n\ = \/2vrnn”e ”e^", where 0 < < l/12n (see, e.g., [SKIT]), 

r(|) (-i)i-+i 


(30) 


U-i-l) 


(j - i)!r (i - j + I) 2 V^ (j - ^)(j - i - ^)Vj - ^ - 2 

Now by Corollary [2] and (l30l) . if n = 2i for i,x € Z, i > 0, and x > 2, then 




'j — i — 2 


(31) 


(Gr )n,n+23: 


((5; 


r }n,n 


1 (n!)^ (2n+2x)! n+1 1 


4^ (2n)! (n+3;)!(n+3;)! n+x+1 2^ x{x—l)\/x—2 

Using Stirling’s approximation again for i > 1 we have 


qR2{x-2)—'^Rx-2 


(32) 


{Gj. )n,n+2x 


(G 


r Jn,n 


Vn n+1 U-|) 1 1 

/n+x n+x+1 x—1 x\/x—2 2.%fK 


where = 2i?„ — R 2 n + R 2 n+ 2 x — ‘^Rn+x + R 2 {x- 2 ) ~ ‘^Rx -2 < 1/3 for all n,x G Z'*' with x > 3. 
According to when i = 0, it is easy to show that 


(33) 



< 


(G 




lo,o 


^ 1 1 

27r (x + 1) x3/2-y/x — 2’ 


for X > 2. 


Similarly, if n = 2i + 1 for i > 0 it is not difficult to verify that (j32p still holds. Therefore, one can 
see that each row satisfies 


(Gj- ')n,n+2x 


< 


2v^ Xy/x — 2 


(g; 


-1^ 


The remainder of the proof of (|27p is now easily established using Corollary [2j 

Similarly, we can bound the rate of decay of each column off of the diagonal. By Corollary [2l 
(l30l) . and Stirling’s approximation, if n = 2i for i > 3, 2 < x < , then 


(Gj, ')n—2x,n 


(/^—1\ . / / n '\ n—2x+l (^_g.'i 

t r >n,n 20F n—xj n—x+1 x—1 J x\Jx—2 


where IR = 2Rn — i?2n + R 2 (n-x) ~ ‘^Rn-x + R 2 (x- 2 ) ~ ‘^Rx -2 < 1/3 for all n,x £ Z'*' with x > 3. 
Since y/nj{n — x) < -\/2 for 2 < x < z. 


(34) 


{Gj. )n—2x,n 


< 


(G 


-i\ 




For n = 2z + 1 an analogous calculation reveals that (I34p still holds. The remainder of the proof 
of (1^ is now easily established using Corollary [2l Lastly, the proof of (l2^ is easily established 
using Lemma [T] together with Theorem 2.6 in |44] . □ 


We are now in the position to begin studying the compressibility of f/ = G^ in terms of the 
compressibility of f. Let a : [A^] —)■ [A^] be a permutation of [A^] such that 

fa{j) > fcr{j + l) 
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holds for all j G [A^ — 1]. Let p : IR —)• U { 00 } be a modified ramp function with p{x) = x for 

all X G IR'*', and p{x) = 00 for all x < 0. Finally, define the right-distance form j G [A^] to the set 

|cr(n) n G [s — 1], cr(n) = j mod 2| to be 

Xcj(n) - j) 


ds{j) := min 


n G [s — 1], cr{n) = j mod 2 > U { 00 } 


We now have sufficient notation to consider the sizes of specific entries of f'. 
Lemma 2. Let s G [A^] \ {0}. We have that 




< 


31 


(G 


r )j,j 


fa{s) 




2X \ds{j)y/ds{j) - 2^ 


(G 


r )j,j 


holds for all j G [N] with ds{j) > 2. 

Proof: Using that with upper triangular as per Lemma [H we have that 

= iGf^)j,jfj + ^ {Gr^)jj+2lfj+2l- 


1=1 


Rearranging this expression we can see that 

dsij)-l LX-J 

'I U r I < ^ (G^^)jj+2Z fj+2l + ^ (G^^)jj+2Z fj+2l 




< 


1=1 


fa{s) 


Us{j)-1 

\{Gf^)j,j+2i 


l=ds{j) 




(. 


1=1 


2X \ds{j)y^ds{j) - 2^ 


(G-'k 


where the second inequality follows from Holder’s Inequality, Theorem^ and the definition of ds{j). 
Appealing to Theorem 0] again we can also see that 
dsU)-^ 


E \w\ 




< 


(G-^) 


J,3 


1=1 


11 X , ke 


2 8 6^/^T ' 2-/7r ^3 xy/x - 2 




I 


dx 


The remainder of the proof now follows. □ 

We can now begin to understand the compressibility of In particular, we have the following 
result concerning the best s ■ k approximation to f( for any k G Z'*' with k > 5. 

Lemma 3. Let s,k £ [A^] with k > 5, s > 8, and nmin := min |(T(n) n G [s — 1]| > A:. Then, 

Opt 




s-k 


< 7Vn 


fa{s) 


+ 


■sllflli 


Proof: We bound the sum of the magnitudes of all entries in whose indices have right-distance 
> k from |cr(n) n G [s — 1]| using two cases: 

Case I. We bound the magnitudes of the small entries with j < Umin ~ k. Using Lemma [2] we 


can see that 

n-min-k 

E 

j=0 


^min k 

< E 

j=0 


h] 


51 


fa(s) 


'JTll‘l Y (zT-min J)k^min J 4^ 
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Theorem 0] now implies that 

Tlrrtirt k 


E 

j=0 


< 


< 


51^2^7^ 

IOi/tt 

51^2^ 


1O0F 

^ 4.1-^nmin fa{s) 


f(7(s) 

fa{s) 


+ 




-fc 


nmin 1 


TT 


+ 


2||f||i 

H-7 + 

^min 1 

4||f||i 

■N/^min 4\/A; 5 


VJ(ramin - j - 4)2 


TT 


I 1 


\fx (nmin - 4 - x) 


3/2 


Case II. Here we bound the sum of the magnitudes of all entries yfij with j ^ A := {x \ x > 

rimin and ds{x) > k}. Note that there can be at most (s — l)k entries in [A^] \ (H U [rimin])- Using 
Lemma m Theorem 01 and definition of A we can see that 


E 

jeA 


< 


< 


51 


N 


2O0F 

51v^ 

lO'v/vr 


fa{s) 


fa{s) 




E 


U=^min + l^/ \^k ~ ^ 


+ 


e s 


-1) l|f| 


^ s/'^min y/k — 3 


Combining the bounds from Cases I and II now finishes the proof. 


□ 


Note that the vector contains only the (potentially nonzero) negative Fourier series coefficients 
of /i(x) = (l — / (cos(x)). Let 

/ A (iV + 2) \ 

?! := A (0) G 

V A i-N) 

consist of all the potentially nonzero Fourier series coefficients of fi{x). Noting that /(cos(x)) is 
an even real-valued function, one can see that A (a^) = ~fi (2 — w) holds for all ui G Z"*" (with 
A (1) = 0). As a result, Lemmas [2] and [3] trivially extend to A. Using this fact in combination 
with results from [28] finally allows us to prove the main result of this section. 


Theorem 5. Let s,k G [N] with k > 5, s > 8, S := |cr(n) n G [s — 1]|, and := min 5 > k. 
Given j G S, define the new right-distance from j to S to be 
' p {a(n) - j) 


d'sU) ■= min 


n G [s — 1], cr(n} = j mod 2 > \ {0} U {oo} I > 0 = ds{j) 


Then, the deterministic variant of the algorithm referred to by Theorem 0] will recover all j G S 
satisfying both 

17y/N\\i\\i 


(35) 




^ /224:yG^N 31 

7 s ■ k ^ W 


A- 


A(s) 




2fc \/(Umm - A) (k-5)’ 


and d'fij) > k. Its operation count will be 

O (s^ • k"^ • log^ N) 
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Proof: We consider the behavior of the deterministic variant of the algorithm referred to by The¬ 
orem [1] when executed on fi{x) = (l — f (cos{x)) with sparsity parameter 2s and e = l/k. 
In the course of forming its output trigonometric polynomial, y*, this algorithm is guaranteed to 
identify every j G [—N, N + 2] Ci'Z with the property that 


fi U) 


> 



^ /^ \ opt 


2 

fl- (fl 



\ / 2s-k 

1 


s ■ k 


by Lemma 6 in [28]. 

Suppose that fj satisfies (155]) and also has d'g{j) > k. A trivial variant of LemmaOthen implies 
that 


h{-j) > (G-i) 




31 


i^r )j,j /(t(s) 


Using (|35]l and that > k one can now see that 


hi-j) > 


28 • SyTFAl 
7s ■ k 


fa{s) 


20F 

32V^||f||i 


(G-i) 






Theorem U] followed by Lemma [3] finally implies that 



28 • y/N 

_ 

fl (-j) 

s ■ k 

fa{s) 


7/cy^(nmin - 4) (A: - 5) 

4||f||i 


s ■ k 


( uVn 


fa{s) 


ky/ (remin - 4)(A; - 5) 
2s||f||i 


-b 


\/ ^min 4\/fe 5 


> 



^ /^ \ opt 


2 

fl - (fl) 



\ / 2s-k 

1 


s ■ k 


This guarantees that j will indeed be identified as claimed. 


□ 


The final corollary of this section applies Theorem [5] to the situation of exactly s-sparse vectors 
f. We have the following result; 


Corollary 3. Lets G [A^] with s > 8, S := |cr(n) 


n G s — 


1]}, 


and nmin := min 5 > 


N 


+4> 


17s/2-5 

17s/2. Furthermore, suppose that f is exactly s-sparse with /^-(o) = fa(s-i) > fa{s) = 0- Then, 
there exists a deterministic algorithm which will recover a set of cardinality O (s^) that is guaranteed 
to contain S as a subset. Its operation count will be O (s^ • log^ Al). 


Proof: Apply Theorem [5] with k = 17s/2. The deterministic variant of the algorithm referred to 
by Theorem [1] will recover all j G 5 satisfying both d'g{j) > 17s/2, and 

Sa/ (ramin - 4)(17s/2 - 5) sV (rimin - 4)(17s/2 - 5) s 

as a subset of a set A of cardinality 0{s‘^). Returning all j G [N] whose right-distance to A is 
< 17s/2 provides a set of cardinality O (s^) that is guaranteed to contain 5 as a subset. The result 
follows. □ 
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We conclude this section by mentioning several obvious facts regarding Theorem [5] and Corol¬ 
lary [3l First, we have no doubt that they can be improved in general. The easiest way to do this 
is to utilize randomized SFT methods in place of the deterministic algorithm from Theorem [1] in 
order to reduce the computational complexities involved. It is also clear that the assumptions in 
Corollary [3] can be relaxed rather easily at the price of increased computational costs. Less trivial 
improvements would probably revolve around developing better variants of /i(x) whose Fourier 
coefficients are less contaminated by Legendre coefficients that are “far away” from the associated 
ones (i.e., G~^ should be “more diagonally dominant”). Alternatively, one might design efficient 
filtering schemes which gradually reveal the lower (generally more energetic) frequencies of fi{x) 
so that they do not overwhelm larger (generally less energetic) frequencies as we hunt for them. 
However, we will leave such considerations for future work. We are now ready to consider how ac¬ 
curately we can estimate the Legendre coefficients for the set of supporting polynomials, S C [A^], 
we identify in this section. 


4.2. Coefficient Estimation. Estimating the Legendre coefficients for the polynomials that have 
been identified as present in / is comparatively easy given all the previous work on bounded 
orthonormal systems. We have the following result: 

Lemma 4. Suppose that the normalized random sampling matrix, R G passed to 

Algorithm{J\ satisfies (fTTI) of Theorem\M with e = 3/5. Let S G IR'*', and S C [N] have cardinality 
|5| = s. Then line 2 of of Algorithm\J\ ean produee an s-sparse vector, z G satisfying 


in O [ s^ In^(A^) • In 


f — z 

■time. 


< 5 


f^c 


-b 


f^c 




-b 


Proof: Let f^' G Ill'll be such that Rsig = Ris (i.e., let f^' be £5 with all it’s zero-valued entries 
indexed by removed). Let G Ill'll be defined as in line 2 of Algorithm [H We have that 

By CG discussion (1^ in ^ 

-b RLs^ + S) Since y = i? -b 


fs-fs 


< 

< 

< 

< 


\/ i — e 
ms Zmin Rsf, 


S Il2~ 


1 


2 

- vT=b 

^ 2\/T+e 

- vT^ 


a-e 

Rs'^min y 
5 


% 


2 + 

2 ^ ^/i I ^ n/We 


By the definition of Zmin 
Using Exercise 6.6 in m 


One can now (implicitly) form z = Z 5 from fc in the obvious fashion. Doing so we learn that 


f — z 


< 


Z5 - fs 


-b 


f^c 


< 1 + 


2 \/l -b 


f^c 


2^/l -b e 
2 y/l-e 


f^c 


-b 


■\/s y/1 — e 


Vl - e 

The total runtime complexity follows from the discussion regarding line 2 of Algorithm [ 1 ] in Sec¬ 
tion [3l □ 


We are now prepared to test a particular version of Algorithm [T] numerically. As we shall 
see, the experiments demonstrate (as a proof of concept) that SFTs can be used to build stable 
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sublinear-time algorithms capable of rapidly computing Legendre coefficients whenever they exhibit 
compressibility. 


5. Empirical Evaluation 


We now present representative results demonstrating the numerical robustness and efficiency of 
the proposed SFT-based Legendre method. For the experiments in this section we used FFTW 3.3.4 
to implement Iserles’ scheme [26j for the purposes of comparison (see (1171) in 112.2|) . FFTW3 [IB] 
is a highly efficient implementation of the “standard” FFT algorithm - it has been systematically 
optimized over the course of the last two decades and remains one of the fastest freely available 
FFT implementations available today. The parameter r in (I13p was chosen to be 1 — 10“® for all 
experiments. The qualitative behavior for other values of r sufficiently close to 1 is similar. The 
number of terms, M, in (I17p was varied differently in each experiment, as indicated below. 

Algorithm [T] was implemented using AAFFT [29j as the sparse Fourier transform, followed by 
a conjugate gradient cod^ in order to compute the necessary Legendre coefficients. Although less 
optimized than some other SFT implementations, AAFFT’s code is well documented, readable, 
and easy to modify. All code used to perform the experiments below is freely availablejj 

Every data point in the first two figures below is the result of 100 trials performed on 100 different 
randomly generated polynomials, 

(36) f{x) = ^ amLm{x), 

m£S, 151=5 

where S C [A^] contains s entries independently chosen uniformly at random from [N], and each 
o-m £ {~1) 1} is independently chosen to be ±1 with probability 1/2. Figure [1] reports runtime 
results averaged over the 100 independent trials at each data point for Algorithm [1] versus the 
method outlined in ^2.21 All runtimes are reported in tick counts using the “cycle.h” header 
included with the FFTW 3.3.4 library. Tick counts correspond closely to the number of CPU cycles 
used during the execution of a program and, therefore, accurately reflect each implementation’s 
comparative computational complexity. As one can see. Algorithm [1] is faster than the method 
outlined in ^2.21 when s < 



(a) Runtime with N = 2^^ as sparsity, s, varies. 



Figure 1. Runtime comparison between Algorithm [T] implemented with AAFFT 
as the SFT, and the approach from [26| implemented with FFTW as the FFT. 

AAFFT is an implementation of a randomized Fourier method with a user-tunable probability 
of failure (see Theorem [T] for a similar probabilistic SFT recovery guarantee). The runtime results 

^The conjugate gradient code is available at http://people.sc.fsu.edu/~jburkardt/cpp_src/cg/cg.html 
®A11 code is available at http://www.math.msu.edu/~markiwen/Code.html 

®Using a faster SFT implementation would doubtlessly produce faster results for Algorithm [T] - AAFFT is the 
computational bottleneck here. 
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for Algorithm [T] in Figured] were produced with settings for AAFFT which guaranteed Algorithm [T] 
to have an £^-error of size < 10“^ on the s true Legendre coefficients, am with m G S, for more 
than 70 of the 100 trials used to generate each data point. More detailed information concerning 
approximation errors for these experiments is reported in Figure [23 Although highly accurate for 
small values of N (see |26j ), the method outlined in 1)2.21 has a relatively low accuracy for the large 
degree polynomials considered herein, achieving only two or three digits of accuracy per Legendre 
coefficient on average. Figure [23 compares its average £^-error on the true Legendre coefficients, am 
with m G S, over all 100 trials at each data point against Algorithm [TJs average £^-error over the at 
least 70 trials at each data point for which it correctly identified a superset of S from (lOOp . As one 
can see. Algorithm [1] is generally more accurate when it manages to identify S. The error graphs 
for the other values of N considered herein were similar. For example, the method outlined in 1)2.21 
had an average .£^-error that was always less than 0.05 for the experiments reported in Figure [Tb] 
at each N, while Algorithm [TJs average £^-error was always < 10“® for these experiments whenever 
it found S (for more than 70% of the trials for each data point). 


10” 

10“” 

^^-Error for N = 2“ 




SIO- 


— FFTW + (17) 



•••Algorithm 1 

~10““ 


- 

10"® 




5 10 15 20 25 30 35 40 45 50 

Sparsity 


(a) Average error for N = 2^^ runtime experiments 


Samples Used by Algorithm 1 for N = 2^^^ 



Figure 2. Additional information regarding the runtime experiments in Figure [13 
The number of terms, M, used in 017)) for Figure [23 was chosen separately from 
the range [1, 25] for each of the 100 trials in order to give the lowest possible error. 
The best value of M was usually 1, however. Choosing M to be much larger did 
not usually help. Figure I2bl plots the average number of evaluations of each trial 
polynomial /, as a percentage of N, that Algorithm [T] used in order to produce the 
results plotted in Figures [23 and [13 


Figure [3] reports the results of some additional experiments on numerical accuracy, stability, and 
robustness to noise. For these experiments both the maximum degree, iV, and the sparsity, s, of 
the trial functions were fixed. In addition, the trial functions were modified so that each one was 
of the form 

(37) f{x) = Yj E &mLm(x), 

m£S, |S|=s m£[N]\S 

where (i) S C [N] contains s entires independently chosen uniformly at random from [N], (ii) each 
am £ {~li 1} is independently chosen to be ±1 with probability 1/2, and {Hi) each bm is an i.i.d 
mean 0 Gaussian random number generated numerically via the Box-Muller method. As above, 
each data point in Figure [3] is the result of 100 trials performed on 100 independently generated 
polynomials of this form 037p . 

For the experiments in Figure [3] Algorithm [1] had its parameters set so that it would compute 
all Legendre coefficients to at least 8 digits of precision for at least 90% of trials in the noiseless 
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Noise Tolerance for N = 16384 and Sparsity s = 50 
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Figure 3. Robustness to approximate sparsity. The horizontal axis varies over ten 


signal-to-noise values given by log;^o 


Eme[iV]\S 


= log 


10 VE 


50 


^me[JV]\S 


The 


vertical axis graphs the average £^-error of each method. 


setting (i.e., when (l36]l holds). The log signal-to-noise ratio, 
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was then varied by renormalizing the i.i.d. bm’s generated for each trial. The resulting log signal- 
to-noise ratios appear on the horizontal axis of Figure[3j The vertical axis plots the average £2.error 
on the true Legendre coefficients, am with m £ S, over all trials at each noise level of the method in 
112.21 against Algorithm [TJs average £^-error over the at least 90 trials at each noise level for which 
it correctly identified a superset of S' C [N] from ([37)1 . The number of terms, M, used in (fT7t) for 
the 112.21 method was again chosen separately for each of the 100 trials in order to give the lowest 
possible error, except that here M G [1,45]. The best value of M was, as before, usually 1, however. 
Increasing M still did not usually help decrease the error. The qualitative behavior for other values 
of s and N was similar. As above, we conclude that Algorithm [T] is generally more accurate than 
the original ^2.21 method provided that it correctly identifies (a superset of) the support set S. 


6. Conclusion 

In this paper we have demonstrated that SFT techniques can be used to help rapidly approximate 
functions with sparse Legendre coefficient expansions. Together with the problems already relegated 
to future work, we believe that it would be interesting to consider extending these methods to 
functions which exhibit sparsity in other types of Gegenbauer polynomial expansions. Given the 
level of success in both the Chebyshev and Legendre settings it seems likely that SFTs can be 
utilized to good effect more generally. 
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Appendix A. Proof of Lemma [T] 


Assuming that / : [—1,1] —)• ]R is a Legendre polynomial of degree N, we can begin to compute 
fr- This will, in turn, reveal the entries of G~^ from (I18|) . Recall that 




Setting z := -— and expanding the Legendre polynomials in terms of the canonical poly¬ 

nomial basis (see, e.g., m) we obtain 
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Note that even powers of z will only ever contain even powers of both r and r<E^^. Similarly, 

odd powers of z will only ever contain odd powers of and Thus, it we can see that 
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A.l. The Even Fourier Coefficients. Expanding /e(x) from ()40l) using the Binomial Theorem, 
reindexing, and then changing the order of summation, we obtain 
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Multiplying the (l — factor through the sum above and recombining terms now yields 
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Recalling from (I18|) that we are primarily concerned with I = 0, —1,... , — [yj in the expression 
above, we can now recombine terms in the bracketed sum to obtain the relevant even Fourier series 
coefficients of fr from /e (please note that, in fact, that the last line of our calculation above does 
not technically hold as written unless I < 0!). Doing so, we learn that 
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for I = 0, —1,..., — + 1. This combined with (I42|l gives all entries in the even rows of G. 
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A.2. The Odd Fourier Coefficients. Expanding fo{x) from (I4ip using the Binomial Theorem, 
reindexing, and then changing the order of summation, we obtain 
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for some Cq, ..., C'|-jv_^-| G IR with which we need not concern ourselves at the moment. Multiplying 
the (l — ®^“*) factor through the sum above and recombining terms now yields 
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We can now recombine terms in the bracketed sum to obtain the relevant odd Fourier series 
coefficients of fr from /q. Doing so, we learn that 
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for Z = 0, — 1,..., — — 1 ] +1. This combined with (11^ gives all entries in the odd rows of 


Appendix B. Proof of Corollary [2] 

We will once again consider the even and odd rows separately. 

B.l. The Even Rows. From ()23p in Lemma [U the nonzero entries in the even rows of the inverse 
matrix G~^ for i < j < can be rewritten as 
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The hypergeometric function F can be expressed by 
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where m is a positive integer and 7 is neither zero nor a negative integer (see, e.g., [37]). Therefore, 
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The special value of F{a, f3;'y, z) at z = 1 can be expressed in terms of a Gamma function [37]. 
That is, F{a, j3] 7 ,1) = (r( 7 )r (7 — a — 13)) / (r (7 — a)r (7 — /?)). Thns, we deduce that 
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B.2. The Odd Rows. Similarly, from (I24|) in Lemma [H the nonzero entries in the odd rows of 
the inverse matrix G~^ for i < j < can be rewritten as 
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where T is a hypergeometric function. Again, by using a Gamma function for the special value of 
F at z = 1, we hnally deduce that 
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